Hindi Handwriting Recognition

Here, we apply convolutional neural networks (CNNs) to classify handwritten Hindi characters and digits.

A Balanced Multi-Class Image Classification Problem

Glimpse into our Data

Figure : 36 Hindi Characters

Figure : 1-9 Hindi Digits

Data Preparation

Train-Validation-Test Split

Original Data   Total    Train    Validation   Test
character_1     2,000    1,400    300          300
character_2     2,000    1,400    300          300
digit_9         2,000    1,400    300          300
Total           92,000   63,000   13,500       13,500

There are 36 characters and 9 digits, giving 45 classes. Every class has the same number of images, so this is a balanced multi-class classification problem. Based on the split above, we use batch_size = 250, which gives steps_per_epoch = 63,000 / 250 = 252 for training and 13,500 / 250 = 54 steps each for validation and testing.
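The step counts follow directly from the split sizes. As a minimal sketch (the variable names per_class, n_images and steps are ours, not from the original script), the arithmetic can be checked in R:

```r
# Images per class in each split, taken from the split table
per_class   <- c(train = 1400, validation = 300, test = 300)
num_classes <- 45    # 36 characters + 9 digits
batch_size  <- 250

# Total images per split, and the number of batches needed to see each image once
n_images <- per_class * num_classes
steps    <- ceiling(n_images / batch_size)

print(n_images)  # values: 63000, 13500, 13500
print(steps)     # values: 252, 54, 54
```

ceiling() only matters when a split size is not a multiple of the batch size; here every division is exact.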

Setting Directory

original_dataset_dir = "C:/Users/KUNAL/Downloads/#R coding/#Books/covered/#Book - Manning - Deep Learning with R and Keras/## Article"

base_dir = "C:/Users/KUNAL/Downloads/#R coding/#Books/covered/#Book - Manning - Deep Learning with R and Keras/## Article/HindiCharacter"

train_dir = file.path(base_dir,"Training")
validation_dir = file.path(base_dir,"Validation")
testing_dir = file.path(base_dir,"Test")
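flow_images_from_directory() expects one sub-folder per class inside each of these directories, so it can be worth confirming that the folders really are balanced before training. The helper below is a hypothetical sketch: count_images_per_class is not part of keras, and the .png/.jpg extensions are an assumption about how the images are stored. It is demonstrated on a small temporary folder rather than on the real dataset.

```r
# Count the image files in each class sub-folder of a split directory.
# The file extensions in `pattern` are an assumption; adjust them to the dataset.
count_images_per_class <- function(split_dir, pattern = "\\.(png|jpg|jpeg)$") {
  class_dirs <- list.dirs(split_dir, full.names = TRUE, recursive = FALSE)
  counts <- vapply(class_dirs, function(d) {
    length(list.files(d, pattern = pattern, ignore.case = TRUE))
  }, integer(1))
  names(counts) <- basename(class_dirs)
  counts
}

# Demo on a throwaway folder with two classes and three empty "images" each
demo_dir <- file.path(tempdir(), "demo_split")
for (cls in c("character_1", "digit_9")) {
  dir.create(file.path(demo_dir, cls), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(demo_dir, cls, paste0("img_", 1:3, ".png")))
}

counts <- count_images_per_class(demo_dir)
print(counts)

# TRUE when every class folder holds the same number of images
is_balanced <- length(unique(counts)) == 1
print(is_balanced)
```

On the real data, calling count_images_per_class(train_dir) should return 45 entries of 1,400 each if the split matches the table.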

Network Structure

# Hyperparameters
img_height = 32
img_width = 32
batch_size = 250
num_classes = 45 

library(keras)

# Rescale pixel values from [0, 255] to [0, 1]
datagen = image_data_generator(rescale = 1/255)
train_generator <- flow_images_from_directory(train_dir, datagen,
                    target_size = c(img_height, img_width),
                    batch_size = batch_size,
                    class_mode = "categorical", 
                    color_mode = "grayscale")
# 63000 images belonging to 45 classes
val_generator   <- flow_images_from_directory(validation_dir,datagen,
                    target_size = c(img_height, img_width),
                    batch_size = batch_size,
                    class_mode = "categorical", 
                    color_mode = "grayscale")
# 13500 images belonging to 45 classes
test_generator  <- flow_images_from_directory(testing_dir,datagen,
                    target_size = c(img_height, img_width),
                    batch_size = batch_size,
                    class_mode = "categorical",
                    shuffle = FALSE,  # keep file order so predictions line up with the labels
                    color_mode = "grayscale")
# 13500 images belonging to 45 classes

# Model Structure
model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu", 
                input_shape = c(img_height, img_width, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2),strides = c(2,2),padding = "same") %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2),strides = c(2,2),padding = "same") %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2),strides = c(2,2),padding = "same") %>%
  layer_flatten() %>%
  layer_dropout(rate = 0.5) %>% 
  layer_dense(units = 64, activation = "relu") %>% 
  layer_dense(units = num_classes, activation = "softmax")

summary(model)

Figure : Network Structure
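The output sizes and parameter counts reported by summary(model) can be worked out by hand. The sketch below assumes the Keras defaults used in the code above: convolutions with stride 1 and no padding ("valid"), and max pooling with padding = "same", which rounds the output size up. The helper names are ours.

```r
# Spatial size after a "valid" convolution with stride 1
conv_out <- function(n, kernel) n - kernel + 1
# Spatial size after pooling with padding = "same"
pool_out <- function(n, stride) ceiling(n / stride)

size  <- 32
sizes <- c(input = size)
for (k in 1:3) {
  size <- conv_out(size, 3); sizes[paste0("conv", k)] <- size
  size <- pool_out(size, 2); sizes[paste0("pool", k)] <- size
}
print(sizes)  # values: 32, 30, 15, 13, 7, 5, 3

# Units entering the dense classifier after layer_flatten(): 3 * 3 * 64
flat_units <- size * size * 64

# Trainable parameters: weights plus one bias per filter / unit
conv_params  <- function(k, in_ch, filters) (k * k * in_ch + 1) * filters
dense_params <- function(in_units, units) (in_units + 1) * units

params <- c(conv1  = conv_params(3, 1, 32),
            conv2  = conv_params(3, 32, 32),
            conv3  = conv_params(3, 32, 64),
            dense  = dense_params(flat_units, 64),
            output = dense_params(64, 45))
print(params)       # values: 320, 9248, 18496, 36928, 2925
print(sum(params))  # 67917
```

Pooling, flatten and dropout layers have no trainable parameters, so they do not appear in the count. More than half of the weights sit in the 576-to-64 dense layer.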

Compile, Train & Evaluate

# Compile
model %>% compile(loss="categorical_crossentropy",
                  optimizer=optimizer_adam(learning_rate = 0.001), 
                  metrics=c("acc"))
# Callback
filepath = file.path(original_dataset_dir,"Hindi_final_model.h5")

check = callback_model_checkpoint(filepath,monitor = 'val_acc',
                                  verbose = 1,
                                  save_best_only = TRUE,
                                  mode = "max")

early  = callback_early_stopping(monitor = "val_loss",
                                 mode = "min",
                                 patience = 3)

lr_red = callback_reduce_lr_on_plateau(monitor = "val_loss",
                                       patience = 2,
                                       verbose = 1, 
                                       factor = 0.3,
                                       min_lr = 0.000001)

callback_list = list(early,lr_red,check)

# Train. fit() accepts the generator directly; fit_generator() is deprecated in recent keras releases
history <- model %>% fit(train_generator,
                         steps_per_epoch = 252,
                         epochs = 40,
                         callbacks = callback_list,
                         validation_data = val_generator,
                         validation_steps = 54)

Callbacks used:

  • Model Checkpoint - saves the model whenever validation accuracy improves on its best value so far.
  • Early Stopping - stops training when validation loss has not improved for 3 consecutive epochs.
  • Reduce LR on Plateau - multiplies the learning rate by 0.3 when validation loss has not improved for 2 consecutive epochs, down to a minimum of 0.000001.
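To see how the two patience settings interact, the rules can be replayed on a made-up sequence of validation losses. This is a simplified sketch, not the Keras implementation: simulate_callbacks is a hypothetical helper, both rules share a single "best so far" value, min_delta and cooldown are ignored, and the loss values are invented (in real training they would respond to the learning-rate changes).

```r
# Replay the Early Stopping (patience 3) and Reduce LR on Plateau (patience 2,
# factor 0.3) rules on a fixed vector of validation losses.
simulate_callbacks <- function(val_loss, lr = 0.001, factor = 0.3,
                               lr_patience = 2, stop_patience = 3,
                               min_lr = 1e-6) {
  best <- Inf
  lr_wait <- 0
  stop_wait <- 0
  lr_used <- numeric(0)
  for (epoch in seq_along(val_loss)) {
    lr_used[epoch] <- lr                 # learning rate used during this epoch
    if (val_loss[epoch] < best) {        # improvement: reset both counters
      best <- val_loss[epoch]
      lr_wait <- 0
      stop_wait <- 0
    } else {                             # no improvement: both counters advance
      lr_wait <- lr_wait + 1
      stop_wait <- stop_wait + 1
    }
    if (stop_wait >= stop_patience) break
    if (lr_wait >= lr_patience) {
      lr <- max(lr * factor, min_lr)
      lr_wait <- 0
    }
  }
  n <- length(lr_used)
  data.frame(epoch = seq_len(n), val_loss = val_loss[seq_len(n)], lr = lr_used)
}

# Invented validation losses, for illustration only
run <- simulate_callbacks(c(0.50, 0.30, 0.25, 0.26, 0.27, 0.24, 0.25, 0.26, 0.27))
print(run)
# Training stops after epoch 9; the learning rate is 0.001 for epochs 1-5,
# 0.0003 for epochs 6-8 and 0.00009 for epoch 9.
```

Each reduction multiplies the rate by 0.3, so starting from 0.001 the reachable values are 0.0003, 0.00009, 0.000027 and so on. Because the learning-rate patience (2) is shorter than the stopping patience (3), the rate is always lowered once before training is abandoned.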

Training Logs

Training Accuracy Plots

# Evaluate
model %>% evaluate(test_generator, steps = 54)
# 54/54 [==============================] - 7s
# 138ms/step - loss: 0.0737 - acc: 0.9784

Visualising our Network

Figure : Character 4 (left) and Digit 6 (right)

Figure : activations_1_conv2d_19 - Character_4

Figure : activations_1_conv2d_19 - Digit_6

Figure : activations_2_max_pooling2d_19 - Character_4

Figure : activations_2_max_pooling2d_19 - Digit_6

Figure : activations_3_conv2d_18 - Character_4

Figure : activations_3_conv2d_18 - Digit_6

Figure : activations_4_max_pooling2d_18 - Character_4

Figure : activations_4_max_pooling2d_18 - Digit_6

Figure : activations_5_conv2d_17 - Character_4

Figure : activations_5_conv2d_17 - Digit_6

Figure : activations_6_max_pooling2d_17 - Character_4

Figure : activations_6_max_pooling2d_17 - Digit_6

As we can see, the first layer seems to act as an ‘edge detector’, picking up the structure of the characters and digits. As we move towards the higher layers, the representations become more and more abstract. Some activation maps are completely blank, which indicates that the pattern encoded by the corresponding filter is absent from the input image.
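To make the edge-detector reading and the empty activation maps concrete, here is a toy example in base R. It is only an illustration under stated assumptions: the 6x6 image and the two 3x3 kernels are hand-written for this sketch, whereas the filters in the trained network are learned, and conv2d_relu is our own helper, not a keras function.

```r
# "Valid" 2D cross-correlation followed by ReLU, as computed by one filter
# of a convolution layer with activation = "relu" (bias omitted).
conv2d_relu <- function(img, kernel) {
  kh <- nrow(kernel)
  kw <- ncol(kernel)
  out <- matrix(0, nrow(img) - kh + 1, ncol(img) - kw + 1)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      patch <- img[i:(i + kh - 1), j:(j + kw - 1)]
      out[i, j] <- max(0, sum(patch * kernel))
    }
  }
  out
}

# Hypothetical input: a bright vertical stroke on a dark background
img <- matrix(0, 6, 6)
img[, 3:4] <- 1

# Hand-written kernels: columns (-1, 0, 1) respond to dark-to-bright
# transitions from left to right; the transpose responds top to bottom.
vertical_edge   <- matrix(c(-1, -1, -1, 0, 0, 0, 1, 1, 1), nrow = 3)
horizontal_edge <- t(vertical_edge)

act_vertical   <- conv2d_relu(img, vertical_edge)
act_horizontal <- conv2d_relu(img, horizontal_edge)

print(act_vertical)    # columns 1-2 are 3 (left edge of the stroke), columns 3-4 are 0
print(act_horizontal)  # all zeros: the image contains no horizontal edge
```

The second map is the toy counterpart of a blank activation map: the filter works, but the pattern it looks for is not present in this input.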

Structural Experimenting

Sl.  Structures                                                                             Training  Validation  Testing  Remarks
1    (32,64 pool: 2,5 stride: 2,5) without dropout layer                                    97.66%    91.24%      95.49%   Overfitting
2    (32,64 pool: 2,5 stride: 2,5) with 50% dropout                                         91.18%    93.48%      96.41%   epochs = 10/40, lr = 0.0001
3    (32,64 pool: 2,5 stride: 2,5) with reduced learning rate                               94.36%    94.87%      97.39%   epochs = 12/40, lr = 0.00009
4    (32,32,64 pool: 2,2,5 stride: 2,2,5) with 50% dropout                                  93.25%    94.62%      97.2%    epochs = 25/40, lr = 0.0003
5    (32,32,64 pool: 2,2,2 stride: 2,2,2) with 50% dropout                                  96.28%    94.33%      97.59%   epochs = 19/40, lr = 0.000027
6    (32,64 pool: 2,2,2 stride: 2,2,2) with 50% dropout                                     97.70%    93.19%      96.86%   epochs = 16/40, lr = 0.00009
7    32,32,64 - kernel 3,3,3 - pool 2,2,2 - stride 2,2,2 - classifier 64 - 50% dropout ***  96.64%    95.07%      97.84%   epochs = 22/40, lr = 0.000027
8    32,32,64 - kernel 5,5,5 - pool 2,2,5 - stride 2,2,5 - classifier 64 - 50% dropout      88.31%    92.44%      95.68%   epochs = 17/40, lr = 0.00009

*** marks the best-performing model.

We see that Model 7 is the best performing model, with a test accuracy of 97.84%. It has three convolution layers with 32, 32 and 64 filters and 3x3 kernels, each followed by 2x2 max pooling with a stride of 2, then a 50% dropout layer and a dense classifier with 64 neurons.
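The comparison can also be made programmatically. The sketch below simply re-enters the accuracies from the experiment table into a data frame (the column names are ours) and ranks the models:

```r
# Accuracies (in %) copied from the experiment table
results <- data.frame(
  model      = 1:8,
  training   = c(97.66, 91.18, 94.36, 93.25, 96.28, 97.70, 96.64, 88.31),
  validation = c(91.24, 93.48, 94.87, 94.62, 94.33, 93.19, 95.07, 92.44),
  testing    = c(95.49, 96.41, 97.39, 97.20, 97.59, 96.86, 97.84, 95.68)
)

# A large positive training-validation gap is a sign of overfitting
results$gap <- results$training - results$validation

# Best model by test accuracy
best <- results[which.max(results$testing), ]
print(best)

# All models ranked by test accuracy
print(results[order(-results$testing), c("model", "testing", "gap")])
```

Model 7 also has the highest validation accuracy (95.07%), so selecting on validation rather than test accuracy would lead to the same choice. Model 1 shows the largest training-validation gap (6.42 points), in line with its "Overfitting" remark.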

The 7th model achieves a test accuracy of 97.84%!